library("dplyr")
library("MCMCpack")
library("LaplacesDemon")
library("rstan")
library("plotrix")
library("RColorBrewer")
library("corrplot")
library("factoextra")

To Do:

The Task (How I understand it)

In this task, participants have to evaluate whether a certain quantifier logically describes a scenario. A scenario is a percentage of, for example, apples that are green: if 20% of the apples are green, is the statement that “most of the apples are green” true or false? Percentages are drawn from a uniform distribution, ensuring that as many trials fall above 50% as below it. The quantifiers of interest are Few, Fewer than half, Many, More than half, and Most. The idea is that More than half is defined as more than 50% for everyone, while Most may be more subjective, with a threshold that exceeds that of More than half for everyone. Additionally, we may inspect the relationship between Many and More than half, as well as symmetries between mirrored quantifiers.

Modeling Approach

I think we could use a probit regression with participant-specific quantifier effects and participant-specific effects of the percentage shown. This way we could at least assess whether the threshold for Most exceeds that of More than half by defining the quantifier effect as a contrast between the two. Within this framework we could even test whether More than half is 50% for everyone.
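As a minimal sketch of this idea (ignoring participant-specific effects for the moment), a fixed-effects probit regression could be fit with `glm`; the data frame `dat` and its column names below are toy stand-ins of mine, not the real data:

```r
# Toy data standing in for the real data set; names are illustrative only.
set.seed(7)
dat <- data.frame(
  response = rbinom(200, 1, .5),
  quant    = factor(sample(c("More than half", "Most"), 200, replace = TRUE)),
  cperc    = runif(200, -.5, .5)
)
# Probit regression; the `quant` coefficient is the Most-vs-More-than-half
# contrast on the probit scale.
fit <- glm(response ~ quant + cperc, family = binomial(link = "probit"), data = dat)
coef(fit)
```

The hierarchical model below replaces these fixed effects with participant-specific ones.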

Model

Disclaimer: I tweaked the model yesterday to accommodate the added quantifier Many and the planned model comparison. This section therefore probably needs a slight update.

Let \(i\) indicate participants, \(i = 1, \ldots, I\), \(j\) indicate the quantifier, \(j = 1, \ldots, 5\), and \(k\) indicate the trial for each quantifier, \(k = 1, \ldots, K_{ij}\).1 Then \(Y_{ijk}\) is the \(i\)th participant’s response to the \(j\)th quantifier in the \(k\)th trial, with \(Y_{ijk} = 1\) if the participant indicates true and \(Y_{ijk} = 0\) if the participant indicates false. We may then model \(Y_{ijk}\) as a Bernoulli variable, using the probit link function on the probabilities:

\[\begin{align*} Y_{ijk} &\sim \mbox{Bernoulli}(\pi_{ijk}),\\ \mu_{ijk} &= \Phi^{-1}(\pi_{ijk}),\\ \end{align*}\]

where the second line maps the probability space of \(\pi\) onto the real space of \(\mu\). We may now place a linear model on \(\mu_{ijk}\):

\[\mu_{ijk} = \beta_{i0} + l_{ijk}\beta_{i1} + h_{ijk}\beta_{i2} + z_{i1} \beta_{i3} + z_{i2} \beta_{i4} + z_{i3} \beta_{i5} + z_{i4} \beta_{i6} + z_{i5} \beta_{i7},\]

where \(l_{ijk}\), \(h_{ijk}\), and \(z_{ij}\) are predictors: \(l_{ijk}\) is zero for percentages above 50% and equals the centered percentage otherwise; \(h_{ijk}\) is zero for percentages below 50% and equals the centered percentage otherwise; and \(z_{ij}\) is an indicator denoting the respective quantifier (e.g., \(z_{i1} = 1\) for the quantifier Few, and \(z_{i1} = 0\) otherwise). Parameters \(\beta_{i0}\) are random intercepts, \(\beta_{i1}\) are random percentage effects for percentages below 50%, \(\beta_{i2}\) are random percentage effects for percentages above 50%, and \(\beta_{i3}\) to \(\beta_{i7}\) are random quantifier effects. For now, I will place the following priors:

\[\begin{align*} \beta_{i0} &\sim \mbox{Normal}(0, \sigma_0^2),\\ \beta_{i1} &\sim \mbox{Normal}(\delta_1, \sigma_1^2),\\ \beta_{i2} &\sim \mbox{Normal}(\delta_2, \sigma_2^2),\\ \beta_{i3} &\sim \mbox{Normal}(\delta_3, \sigma_3^2),\\ \beta_{i4} &\sim \mbox{Normal}(\delta_4, \sigma_4^2),\\ \beta_{i5} &\sim \mbox{Normal}(\delta_5, \sigma_5^2),\\ \beta_{i6} &\sim \mbox{Normal}(\delta_6, \sigma_6^2),\\ \beta_{i7} &\sim \mbox{Normal}(\delta_7, \sigma_7^2). \end{align*}\]
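For concreteness, the piecewise percentage predictors \(l\) and \(h\) from the linear model above can be constructed as follows (a sketch; the example percentages and variable names are mine):

```r
perc  <- c(20, 82, 11, 62, 28, 92)   # example percentages shown on trials
cperc <- (perc - 50) / 100           # centered percentage
l <- ifelse(perc < 50, cperc, 0)     # nonzero only below 50%
h <- ifelse(perc > 50, cperc, 0)     # nonzero only above 50%
cbind(perc, cperc, l, h)
```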

Needed: Proper description of prior settings.

Simulating mistakes

Study 1

Data

##   workerid       A         B percent read_and_decide_time  quant read_time_one
## 1        0 "glerb"  "fizzda"      20                 3702  "All"           183
## 2        0 "thonk" "krangly"      82                  433 "Some"           264
## 3        0 "slarm" "briddle"      11                  287 "None"           216
## 4        0 "klong"   "nooty"      62                   16  "All"            48
## 5        0 "dring"   "larfy"      28                  384 "None"           113
## 6        0 "floom" "plerful"      92                  126 "Some"           104
##   response
## 1    false
## 2    false
## 3     true
## 4     true
## 5    false
## 6    false
## [1] 23220
## [1] 18318
## [1] 17233
##             "All"             "Few" "Fewer than half"            "Many" 
##                 2                48                48                50 
##  "More than half"            "Most"            "None"            "Some" 
##                50                50                 3                 1

## 
## FALSE  TRUE 
##  8477  8756
##        
##         "All" "Few" "Fewer than half" "Many" "More than half" "Most" "None"
##   false    89  2034              1798   1419             1724   1798    132
##   true     10  1277              1605   1943             1726   1606      9
##        
##         "Some"
##   false     13
##   true      50

I now recode the responses to correspond to the expected direction of response. I will therefore flip TRUE and FALSE responses for the quantifiers few and fewer than half.
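A sketch of this recoding step, assuming a data frame `dat` with `quant` and `response` columns as in the preview above (the toy rows are mine):

```r
dat <- data.frame(
  quant    = c("Few", "Fewer than half", "Most", "Many"),
  response = c("true", "false", "true", "false")
)
# Flip the response for the negatively oriented quantifiers.
flip <- dat$quant %in% c("Few", "Fewer than half")
dat$response[flip] <- ifelse(dat$response[flip] == "true", "false", "true")
dat
```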

Estimation

Model

data {
  int<lower=1> D;                     // #Dimensions of the model
  int<lower=0> N;                     // #Observations
  int<lower=1> I;                     // #Participants
  int<lower=0,upper=1> y[N];          // Data 0,1
  vector[N] cperc;                    // Centered Percentages
  int<lower=1,upper=I> sub[N];        // participant vector
  int<lower=0,upper=1> few[N];        // Few
  int<lower=0,upper=1> fewer[N];      // Fewer than half
  int<lower=0,upper=1> many[N];       // Many
  int<lower=0,upper=1> more[N];       // More than half
  int<lower=0,upper=1> most[N];       // Most
  int<lower=0,upper=1> above[N];      // Above 50 percent?
}

parameters {
  real delta[D];                      // Means of betas
  real<lower=0> sigma2[D];            // Variances of betas
  vector[D] beta[I];                  // Vectors of betas
  real nu[D];                         // Means of alphas
  real<lower=0> sigma2alpha[D];       // Variances of alphas
  vector<lower=0>[D] alpha[I];        // Vectors of alphas
  vector<lower=0,upper=1>[D] gamma[I]; // Vectors of gammas
}

transformed parameters {
  real<lower=0> sigma[D];
  real<lower=0> sigmaalpha[D];
  sigma = sqrt(sigma2);
  sigmaalpha = sqrt(sigma2alpha);
}

model {
  vector[N] mu;
  vector[N] p;
  delta ~ normal(0, 5);
  sigma2 ~ inv_gamma(2, .2);
  nu ~ normal(0, 5);
  sigma2alpha ~ inv_gamma(2, .2);
  for (i in 1:I)
    beta[i] ~ normal(delta, sigma);
  for (i in 1:I)
    alpha[i] ~ lognormal(nu, sigmaalpha);
  for (i in 1:I)
    gamma[i] ~ beta(2, 20);
  for (n in 1:N)
    mu[n] = few[n]   * (cperc[n] - beta[sub[n], 1]) / alpha[sub[n], 1]
          + fewer[n] * (cperc[n] - beta[sub[n], 2]) / alpha[sub[n], 2]
          + many[n]  * (cperc[n] - beta[sub[n], 3]) / alpha[sub[n], 3]
          + more[n]  * (cperc[n] - beta[sub[n], 4]) / alpha[sub[n], 4]
          + most[n]  * (cperc[n] - beta[sub[n], 5]) / alpha[sub[n], 5];
  for (n in 1:N) {
    real g = few[n]   * gamma[sub[n], 1]
           + fewer[n] * gamma[sub[n], 2]
           + many[n]  * gamma[sub[n], 3]
           + more[n]  * gamma[sub[n], 4]
           + most[n]  * gamma[sub[n], 5];
    p[n] = g + (1 - 2 * g) * inv_logit(mu[n]);
  }
  y ~ bernoulli(p);
}
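The model can then be fit with rstan roughly as follows; the data list mirrors the `data` block above, but the toy values and the file name `model.stan` are placeholders (the real `predat` is built from the cleaned data):

```r
# Toy data list matching the Stan data block; real values come from the
# cleaned data set.
N <- 10; I <- 2
predat <- list(
  D = 5, N = N, I = I,
  y     = rbinom(N, 1, .5),
  cperc = runif(N, -.5, .5),
  sub   = sample(1:I, N, replace = TRUE),
  few = rep(0, N), fewer = rep(0, N), many = rep(0, N),
  more = rep(1, N), most = rep(0, N),
  above = rep(0:1, 5)
)
# fit <- rstan::stan(file = "model.stan", data = predat,
#                    chains = 4, iter = 1500, warmup = 500)
```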
  • Here is an initial check of how well the chains mixed. This is a bit of a problem for the model; I may have to reparameterize and set one quantifier (More than half?) as the default.

Results

  • The overall estimates of the response curves for the five quantifiers.
## ci_level: 0.8 (80% intervals)
## outer_level: 0.95 (95% intervals)

## [1] 10500  1086
pMeans <- colMeans(pEst) 

I <- predat$I
pdelta1 <- pMeans[1]
pdelta2 <- pMeans[2]
pdelta3 <- pMeans[3]
pdelta4 <- pMeans[4]
pdelta5 <- pMeans[5]
pnu1 <- pMeans[6]
pnu2 <- pMeans[7]
pnu3 <- pMeans[8]
pnu4 <- pMeans[9]
pnu5 <- pMeans[10]
pbeta1 <- pMeans[10 + 1:I]
pbeta2 <- pMeans[10 + I + 1:I]
pbeta3 <- pMeans[10 + 2 * I + 1:I]
pbeta4 <- pMeans[10 + 3 * I + 1:I]
pbeta5 <- pMeans[10 + 4 * I + 1:I]
palpha1 <- pMeans[10 + 5 * I + 1:I]
palpha2 <- pMeans[10 + 6 * I + 1:I]
palpha3 <- pMeans[10 + 7 * I + 1:I]
palpha4 <- pMeans[10 + 8 * I + 1:I]
palpha5 <- pMeans[10 + 9 * I + 1:I]
pgamma1 <- pMeans[10 + 10 * I + 1:I]
pgamma2 <- pMeans[10 + 11 * I + 1:I]
pgamma3 <- pMeans[10 + 12 * I + 1:I]
pgamma4 <- pMeans[10 + 13 * I + 1:I]
pgamma5 <- pMeans[10 + 14 * I + 1:I]

cperc <- (1:100 - 50)/100
curve.calc <- function(x, p = cperc){
  a <- (p - x[1]) / x[2]                        # distance from threshold, scaled by vagueness
  x[3] + (1 - 2 * x[3]) * exp(a) / (exp(a) + 1) # guessing mixture around the logistic
}

pps1 <- curve.calc(c(pdelta1, exp(pnu1), mean(pgamma1)))
pps2 <- curve.calc(c(pdelta2, exp(pnu2), mean(pgamma2)))
pps3 <- curve.calc(c(pdelta3, exp(pnu3), mean(pgamma3)))
pps4 <- curve.calc(c(pdelta4, exp(pnu4), mean(pgamma4)))
pps5 <- curve.calc(c(pdelta5, exp(pnu5), mean(pgamma5)))

layout(matrix(1:6, ncol = 2, byrow = T))

plot(1:100, pps1, type = "l", lwd = 2, col = qcols[1], ylim = c(0,1)
     , ylab = "Probability 'true' response", xlab = "Percent")
lines(1:100, pps2, lwd = 2, col = qcols[2])
lines(1:100, pps3, lwd = 2, col = qcols[3])
lines(1:100, pps4, lwd = 2, col = qcols[4])
lines(1:100, pps5, lwd = 2, col = qcols[5])
abline(h = .5, col = adjustcolor(1, .5))
abline(v = 50, col = adjustcolor(1, .5))
legend("bottomright", legend = c("Few", "Fewer than half", "Many", "More than half", "Most")
       , fill = qcols, bty = "n")

res1 <- apply(cbind(pbeta1, palpha1, pgamma1), 1, curve.calc)
matplot(res1, type = "l", lty = 1, col = qcols[1], main =  "Few")
res2 <- apply(cbind(pbeta2, palpha2, pgamma2), 1, curve.calc)
matplot(res2, type = "l", lty = 1, col = qcols[2], main =  "Fewer than half")
res3 <- apply(cbind(pbeta3, palpha3, pgamma3), 1, curve.calc)
matplot(res3, type = "l", lty = 1, col = qcols[3], main =  "Many")
res4 <- apply(cbind(pbeta4, palpha4, pgamma4), 1, curve.calc)
matplot(res4, type = "l", lty = 1, col = qcols[4], main = "More than half")
res5 <- apply(cbind(pbeta5, palpha5, pgamma5), 1, curve.calc)
matplot(res5, type = "l", lty = 1, col = qcols[5], main =  "Most")

  • Individual estimates of the response curves. There is quite a bit of variability for Many, even more than for Most. Roughly speaking, the ordering of thresholds is Many, More than half, Most. Few has the shallowest response curve.
layout(matrix(c(0, 1,1, 2,2, 0, 3,3,4,4,5,5), nrow = 2, byrow = T))
par(mgp = c(2, .7, 0), mar = c(3,3,1,1))

matplot(1:100, res1, type = "l", lty = 1, main = "Few", ylim = c(0,1), col = adjustcolor(1, .1)
        , ylab = "Probability 'true' response", xlab = "Percent", frame.plot = F)
abline(h = .5, col = adjustcolor(1, .5))
abline(v = 50, col = adjustcolor(1, .5))
lines(1:100, pps1, lwd = 3, col = qcols[1])

matplot(1:100, res2, type = "l", lty = 1, main = "Fewer than half", ylim = c(0,1), col = adjustcolor(1, .1)
        , ylab = "Probability 'true' response", xlab = "Percent", frame.plot = F)
abline(h = .5, col = adjustcolor(1, .5))
abline(v = 50, col = adjustcolor(1, .5))
lines(1:100, pps2, lwd = 3, col = qcols[2])

matplot(1:100, res3, type = "l", lty = 1, main = "Many", ylim = c(0,1), col = adjustcolor(1, .1)
        , ylab = "Probability 'true' response", xlab = "Percent", frame.plot = F)
abline(h = .5, col = adjustcolor(1, .5))
abline(v = 50, col = adjustcolor(1, .5))
lines(1:100, pps3, lwd = 3, col = qcols[3])

matplot(1:100, res4, type = "l", lty = 1, main = "More than half", ylim = c(0,1), col = adjustcolor(1, .1)
        , ylab = "Probability 'true' response", xlab = "Percent", frame.plot = F)
abline(h = .5, col = adjustcolor(1, .5))
abline(v = 50, col = adjustcolor(1, .5))
lines(1:100, pps4, lwd = 3, col = qcols[4])

matplot(1:100, res5, type = "l", lty = 1, main = "Most", ylim = c(0,1), col = adjustcolor(1, .1)
        , ylab = "Probability 'true' response", xlab = "Percent", frame.plot = F)
abline(h = .5, col = adjustcolor(1, .5))
abline(v = 50, col = adjustcolor(1, .5))
lines(1:100, pps5, lwd = 3, col = qcols[5])

  • Now for the thresholds.
##               few       fewer        many        more       most
## few    1.00000000 -0.11875910  0.30317693  0.00064865 -0.2267883
## fewer -0.11875910  1.00000000 -0.08713616 -0.20661865  0.1195840
## many   0.30317693 -0.08713616  1.00000000  0.06969840  0.1719174
## more   0.00064865 -0.20661865  0.06969840  1.00000000 -0.1694777
## most  -0.22678827  0.11958399  0.17191740 -0.16947768  1.0000000

##               few       fewer        many        more       most
## few    1.00000000  0.05281385  0.07725865 -0.12947064 0.09840626
## fewer  0.05281385  1.00000000 -0.09618068  0.04532783 0.08532919
## many   0.07725865 -0.09618068  1.00000000  0.10239056 0.25590820
## more  -0.12947064  0.04532783  0.10239056  1.00000000 0.26718043
## most   0.09840626  0.08532919  0.25590820  0.26718043 1.00000000

##             few     fewer      many      more      most
## few   1.0000000 0.7542331 0.4082249 0.4606126 0.6538767
## fewer 0.7542331 1.0000000 0.4216081 0.3683092 0.5726053
## many  0.4082249 0.4216081 1.0000000 0.2359867 0.5168501
## more  0.4606126 0.3683092 0.2359867 1.0000000 0.3682729
## most  0.6538767 0.5726053 0.5168501 0.3682729 1.0000000

Comparing “More than half” and “Most”

Bayes factor assessing whether the threshold for Most is higher than that for More than half for everyone.

Comparing “More than half” and “Many”

Bayes factor assessing whether the threshold for Many is lower than that for More than half for everyone.

## [1] 0
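The Bayes factor here can be computed via the encompassing-prior approach: the proportion of posterior samples in which the order constraint holds for every participant, divided by the corresponding prior proportion. A toy sketch with made-up samples standing in for the real posterior:

```r
set.seed(1)
S <- 1000; I <- 5                                    # toy: 1000 samples, 5 participants
theta_many <- matrix(rnorm(S * I, -.10, .10), S, I)  # stand-in posterior thresholds
theta_more <- matrix(rnorm(S * I,  .00, .05), S, I)
# Proportion of samples in which Many < More than half holds for everyone.
post_prop <- mean(apply(theta_many < theta_more, 1, all))
# bf <- post_prop / prior_prop  # prior_prop computed analogously from prior samples
post_prop
```

If no posterior sample satisfies the constraint, this estimate is 0.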

Additional Analyses

Clustering

As discussed in our meetings, we wanted to run a cluster analysis on the parameters from the probit model to assess whether there are distinct groups of individuals with certain parameter patterns. For example, we hypothesized that there might be a group of individuals who interpret the quantifier Most as More than half and therefore have a threshold close to 50% (or 0 in the current parameterization). Here, we conduct an extensive cluster analysis on 1. the parameters corresponding to the quantifier Most, 2. the parameters corresponding to the quantifier Many, and 3. all parameters from all models.

Cluster Analysis for Quantifier Most

The threshold, vagueness and guessing parameters for all participants were submitted to the cluster analysis. A determination of the optimal number of clusters consistently favored \(K = 2\) clusters.
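A sketch of the clustering pipeline, assuming the per-participant posterior means computed above (replaced here by toy values); `fviz_nbclust` from factoextra, loaded at the top, provides the optimal-K diagnostics:

```r
set.seed(123)
# Toy stand-ins for the per-participant threshold, vagueness, and guessing
# estimates (e.g., pbeta5, palpha5, pgamma5 for Most).
X <- scale(cbind(threshold = rnorm(50, .20, .15),
                 vagueness = rlnorm(50, -2, .30),
                 guessing  = rbeta(50, 2, 20)))
km <- kmeans(X, centers = 2, nstart = 25)
# factoextra::fviz_nbclust(X, kmeans, method = "silhouette")  # choosing K
table(km$cluster)
```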

Determining the optimal number of clusters.

The figure below shows the response curves based on the cluster centers as parameters. As can be seen, there is a clear shift in threshold between the two clusters: one is perfectly centered at 50% and the other is much higher. This is consistent with our hypothesis that some individuals interpret Most as more than half, while others interpret it as more than More than half.

Next, we investigated individual differences for all quantifiers based on this cluster analysis. The plots for the quantifier Most illustrate that the threshold is consistently and clearly associated with cluster affiliation. There are no additional consistencies for the other quantifiers based on this cluster analysis.

Cluster Analysis for Quantifier Many

The threshold, vagueness, and guessing parameters for all participants were submitted to the cluster analysis. A determination of the optimal number of clusters was inconsistent, with either \(K = 1\) or \(K = 2\) favored. We therefore ran a cluster analysis with \(K = 2\) clusters.

Determining the optimal number of clusters.

The figure below shows the response curves based on the cluster centers as parameters. As can be seen, there is a clear shift in threshold between the two clusters: one is perfectly centered at 50% and the other is much lower. This is consistent with our hypothesis that some individuals interpret Many as more than half, while others interpret it as less than More than half.

Next, we investigated individual differences for all quantifiers based on the cluster analysis for quantifier Many. The plots for the quantifier Many illustrate that the threshold is consistently and clearly associated with cluster affiliation. In addition, individuals in cluster 1, the cluster with thresholds below 50%, tend to have an increased vagueness parameter as well. This seems plausible: if the quantifier Many is interpreted as more than half, as seems to be the case for cluster 2, then reduced vagueness should be associated with this quantifier.

In the second plot we assess whether the clustering on quantifier Many has a relationship with potential clustering for quantifier Few which is often thought of as the mirror quantifier of Many. However, the results are mostly inconsistent. There seems to be a small tendency that cluster 1 has a lower threshold on average than cluster 2, which seems plausible. However, this trend is mild at best. There are no other consistencies from this cluster analysis for the other quantifiers. Interestingly, the individuals who interpret Many as more than half do not seem to be the same individuals who interpret Most as more than half.

Cluster Analysis with Combining Many and Most

The threshold, vagueness, and guessing parameters for all participants for quantifiers Most and Many were submitted to the cluster analysis. A determination of the optimal number of clusters was inconsistent, with either \(K = 1\) or \(K = 6\) favored. From a theoretical perspective, given the two previous analyses, we expected \(K = 4\) clusters and therefore picked this number.

Determining the optimal number of clusters.

The figure below shows the response curves for Many and Most based on the cluster centers as parameters. As can be seen, there is a clear shift in threshold between the four clusters for Many, and between most of the clusters for Most. It almost seems as if the 50% cluster for Many breaks down more, while the 50% cluster for Most is quite consistent.

Cluster Analysis with all parameters \(K = 4\)

Looking at individual differences for all quantifiers based on cluster analysis with all quantifiers

Cluster Analysis with all parameters \(K = 2\)

Looking at individual differences for all quantifiers based on cluster analysis with all quantifiers

Simulation

Simple rule-based responding by setting a fixed boundary. The lines represent different levels of perceptual noise. Where this source of noise would come from in our experiment is completely unclear to me.

Negation adds error/noise to the probability of responding correctly. This can either be uniform (makes most sense to me) or a function of the distance from the boundary.

I had a bit of an issue finding a good way of adding noise as a function of distance from the boundary. I have to play around with this a bit more.

A vague quantifier implies sampling the boundary anew from a distribution on each trial.
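The three mechanisms above can be sketched in a few lines; the boundary, noise level, and guessing rate are arbitrary choices of mine:

```r
set.seed(42)
perc <- 1:100
# (1) Fixed boundary plus Gaussian perceptual noise: the probability of a
#     'true' response is the chance the perceived percentage exceeds 60.
p_fixed <- pnorm(perc, mean = 60, sd = 5)
# (2) Negation as uniform noise: mix the curve with guessing at rate gamma.
gamma <- .1
p_neg <- gamma + (1 - 2 * gamma) * p_fixed
# (3) Vague quantifier: redraw the boundary from a distribution each trial
#     and average over trials.
b <- rnorm(1e4, 60, 8)
p_vague <- sapply(perc, function(x) mean(x > b))
```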

Study 2

Data

I now recode the responses to correspond to the expected direction of response. I will therefore flip TRUE and FALSE responses for the quantifiers few and fewer than half.

## [1] 16930

Estimation

Model

Results

  • The overall estimates of the response curves for the five quantifiers.
## ci_level: 0.8 (80% intervals)
## outer_level: 0.95 (95% intervals)

## [1] 6900 1191
pMeans <- colMeans(pEst) 

I <- predat$I
pdelta1 <- pMeans[1]
pdelta2 <- pMeans[2]
pdelta3 <- pMeans[3]
pdelta4 <- pMeans[4]
pdelta5 <- pMeans[5]
pnu1 <- pMeans[6]
pnu2 <- pMeans[7]
pnu3 <- pMeans[8]
pnu4 <- pMeans[9]
pnu5 <- pMeans[10]
pbeta1 <- pMeans[10 + 1:I]
pbeta2 <- pMeans[10 + I + 1:I]
pbeta3 <- pMeans[10 + 2 * I + 1:I]
pbeta4 <- pMeans[10 + 3 * I + 1:I]
pbeta5 <- pMeans[10 + 4 * I + 1:I]
palpha1 <- pMeans[10 + 5 * I + 1:I]
palpha2 <- pMeans[10 + 6 * I + 1:I]
palpha3 <- pMeans[10 + 7 * I + 1:I]
palpha4 <- pMeans[10 + 8 * I + 1:I]
palpha5 <- pMeans[10 + 9 * I + 1:I]
pgamma1 <- pMeans[10 + 10 * I + 1:I]
pgamma2 <- pMeans[10 + 11 * I + 1:I]
pgamma3 <- pMeans[10 + 12 * I + 1:I]
pgamma4 <- pMeans[10 + 13 * I + 1:I]
pgamma5 <- pMeans[10 + 14 * I + 1:I]

cperc <- (1:100 - 50)/100
curve.calc <- function(x, p = cperc){
  a <- (p - x[1]) / x[2]                        # distance from threshold, scaled by vagueness
  x[3] + (1 - 2 * x[3]) * exp(a) / (exp(a) + 1) # guessing mixture around the logistic
}

pps1 <- curve.calc(c(pdelta1, exp(pnu1), mean(pgamma1)))
pps2 <- curve.calc(c(pdelta2, exp(pnu2), mean(pgamma2)))
pps3 <- curve.calc(c(pdelta3, exp(pnu3), mean(pgamma3)))
pps4 <- curve.calc(c(pdelta4, exp(pnu4), mean(pgamma4)))
pps5 <- curve.calc(c(pdelta5, exp(pnu5), mean(pgamma5)))

layout(matrix(1:6, ncol = 2, byrow = T))

plot(1:100, pps1, type = "l", lwd = 2, col = qcols[1], ylim = c(0,1)
     , ylab = "Probability 'true' response", xlab = "Percent")
lines(1:100, pps2, lwd = 2, col = qcols[2])
lines(1:100, pps3, lwd = 2, col = qcols[3])
lines(1:100, pps4, lwd = 2, col = qcols[4])
lines(1:100, pps5, lwd = 2, col = qcols[5])
abline(h = .5, col = adjustcolor(1, .5))
abline(v = 50, col = adjustcolor(1, .5))
legend("bottomright", legend = c("Few", "Fewer than half", "Many", "More than half", "Most")
       , fill = qcols, bty = "n")

res1 <- apply(cbind(pbeta1, palpha1, pgamma1), 1, curve.calc)
matplot(res1, type = "l", lty = 1, col = qcols[1], main =  "Few")
res2 <- apply(cbind(pbeta2, palpha2, pgamma2), 1, curve.calc)
matplot(res2, type = "l", lty = 1, col = qcols[2], main =  "Fewer than half")
res3 <- apply(cbind(pbeta3, palpha3, pgamma3), 1, curve.calc)
matplot(res3, type = "l", lty = 1, col = qcols[3], main =  "Many")
res4 <- apply(cbind(pbeta4, palpha4, pgamma4), 1, curve.calc)
matplot(res4, type = "l", lty = 1, col = qcols[4], main = "More than half")
res5 <- apply(cbind(pbeta5, palpha5, pgamma5), 1, curve.calc)
matplot(res5, type = "l", lty = 1, col = qcols[5], main =  "Most")

  • Individual estimates of the response curves. There is quite a bit of variability for Many, even more than for Most. Roughly speaking, the ordering of thresholds is Many, More than half, Most. Few has the shallowest response curve.
layout(matrix(c(0, 1,1, 2,2, 0, 3,3,4,4,5,5), nrow = 2, byrow = T))
par(mgp = c(2, .7, 0), mar = c(3,3,1,1))

matplot(1:100, res1, type = "l", lty = 1, main = "Few", ylim = c(0,1), col = adjustcolor(1, .1)
        , ylab = "Probability 'true' response", xlab = "Percent", frame.plot = F)
abline(h = .5, col = adjustcolor(1, .5))
abline(v = 50, col = adjustcolor(1, .5))
lines(1:100, pps1, lwd = 3, col = qcols[1])

matplot(1:100, res2, type = "l", lty = 1, main = "Fewer than half", ylim = c(0,1), col = adjustcolor(1, .1)
        , ylab = "Probability 'true' response", xlab = "Percent", frame.plot = F)
abline(h = .5, col = adjustcolor(1, .5))
abline(v = 50, col = adjustcolor(1, .5))
lines(1:100, pps2, lwd = 3, col = qcols[2])

matplot(1:100, res3, type = "l", lty = 1, main = "Many", ylim = c(0,1), col = adjustcolor(1, .1)
        , ylab = "Probability 'true' response", xlab = "Percent", frame.plot = F)
abline(h = .5, col = adjustcolor(1, .5))
abline(v = 50, col = adjustcolor(1, .5))
lines(1:100, pps3, lwd = 3, col = qcols[3])

matplot(1:100, res4, type = "l", lty = 1, main = "More than half", ylim = c(0,1), col = adjustcolor(1, .1)
        , ylab = "Probability 'true' response", xlab = "Percent", frame.plot = F)
abline(h = .5, col = adjustcolor(1, .5))
abline(v = 50, col = adjustcolor(1, .5))
lines(1:100, pps4, lwd = 3, col = qcols[4])

matplot(1:100, res5, type = "l", lty = 1, main = "Most", ylim = c(0,1), col = adjustcolor(1, .1)
        , ylab = "Probability 'true' response", xlab = "Percent", frame.plot = F)
abline(h = .5, col = adjustcolor(1, .5))
abline(v = 50, col = adjustcolor(1, .5))
lines(1:100, pps5, lwd = 3, col = qcols[5])

  • Now for the thresholds.
##               few      fewer        many        more        most
## few    1.00000000 -0.1838015  0.19901978  0.04755858  0.06710636
## fewer -0.18380149  1.0000000 -0.15247098  0.14619987  0.12243073
## many   0.19901978 -0.1524710  1.00000000 -0.09941862 -0.10257846
## more   0.04755858  0.1461999 -0.09941862  1.00000000  0.04351189
## most   0.06710636  0.1224307 -0.10257846  0.04351189  1.00000000

##                few        fewer         many         more        most
## few    1.000000000 -0.006115202 -0.035453228 -0.001782599 -0.01184313
## fewer -0.006115202  1.000000000  0.001350216 -0.025687059 -0.03406211
## many  -0.035453228  0.001350216  1.000000000  0.047463519  0.04414786
## more  -0.001782599 -0.025687059  0.047463519  1.000000000 -0.02992121
## most  -0.011843131 -0.034062108  0.044147862 -0.029921206  1.00000000

##               few       fewer        many        more        most
## few    1.00000000 -0.02770359  0.05354589 -0.04673417 -0.03695528
## fewer -0.02770359  1.00000000 -0.08732881  0.15346394 -0.07835742
## many   0.05354589 -0.08732881  1.00000000 -0.12645892 -0.04835597
## more  -0.04673417  0.15346394 -0.12645892  1.00000000 -0.06687674
## most  -0.03695528 -0.07835742 -0.04835597 -0.06687674  1.00000000

Comparing “More than half” and “Most”

Bayes factor assessing whether the threshold for Most is higher than that for More than half for everyone.

Comparing “More than half” and “Many”

Bayes factor assessing whether the threshold for Many is lower than that for More than half for everyone.

## [1] 0

Additional Analyses

Clustering

As discussed in our meetings, we wanted to run a cluster analysis on the parameters from the probit model to assess whether there are distinct groups of individuals with certain parameter patterns. For example, we hypothesized that there might be a group of individuals who interpret the quantifier Most as More than half and therefore have a threshold close to 50% (or 0 in the current parameterization). Here, we conduct an extensive cluster analysis on 1. the parameters corresponding to the quantifier Most, 2. the parameters corresponding to the quantifier Many, and 3. all parameters from all models.

Cluster Analysis for Quantifier Most

The threshold, vagueness and guessing parameters for all participants were submitted to the cluster analysis. A determination of the optimal number of clusters consistently favored \(K = 2\) clusters.

Determining the optimal number of clusters.

The figure below shows the response curves based on the cluster centers as parameters. As can be seen, there is a clear shift in threshold between the two clusters: one is perfectly centered at 50% and the other is much higher. This is consistent with our hypothesis that some individuals interpret Most as more than half, while others interpret it as more than More than half.

Next, we investigated individual differences for all quantifiers based on this cluster analysis. The plots for the quantifier Most illustrate that the threshold is consistently and clearly associated with cluster affiliation. There are no additional consistencies for the other quantifiers based on this cluster analysis.

Cluster Analysis for Quantifier Many

The threshold, vagueness, and guessing parameters for all participants were submitted to the cluster analysis. A determination of the optimal number of clusters was inconsistent, with either \(K = 1\) or \(K = 2\) favored. We therefore ran a cluster analysis with \(K = 2\) clusters.

Determining the optimal number of clusters.

The figure below shows the response curves based on the cluster centers as parameters. As can be seen, there is a clear shift in threshold between the two clusters: one is perfectly centered at 50% and the other is much lower. This is consistent with our hypothesis that some individuals interpret Many as more than half, while others interpret it as less than More than half.

Next, we investigated individual differences for all quantifiers based on the cluster analysis for quantifier Many. The plots for the quantifier Many illustrate that the threshold is consistently and clearly associated with cluster affiliation. In addition, individuals in cluster 1, the cluster with thresholds below 50%, tend to have an increased vagueness parameter as well. This seems plausible: if the quantifier Many is interpreted as more than half, as seems to be the case for cluster 2, then reduced vagueness should be associated with this quantifier.

In the second plot we assess whether the clustering on quantifier Many has a relationship with potential clustering for quantifier Few which is often thought of as the mirror quantifier of Many. However, the results are mostly inconsistent. There seems to be a small tendency that cluster 1 has a lower threshold on average than cluster 2, which seems plausible. However, this trend is mild at best. There are no other consistencies from this cluster analysis for the other quantifiers. Interestingly, the individuals who interpret Many as more than half do not seem to be the same individuals who interpret Most as more than half.

Cluster Analysis with Combining Many and Most

The threshold, vagueness, and guessing parameters for all participants for quantifiers Most and Many were submitted to the cluster analysis. A determination of the optimal number of clusters was inconsistent, with either \(K = 1\) or \(K = 6\) favored. From a theoretical perspective, given the two previous analyses, we expected \(K = 4\) clusters and therefore picked this number.

Determining the optimal number of clusters.

The figure below shows the response curves for Many and Most based on the cluster centers as parameters. As can be seen, there is a clear shift in threshold between the four clusters for Many, and between most of the clusters for Most. It almost seems as if the 50% cluster for Many breaks down more, while the 50% cluster for Most is quite consistent.

Cluster Analysis with all parameters \(K = 4\)

Looking at individual differences for all quantifiers based on cluster analysis with all quantifiers

Cluster Analysis with all parameters \(K = 2\)

Looking at individual differences for all quantifiers based on cluster analysis with all quantifiers


  1. Originally, \(K_{ij} = 50\). But this value may be reduced after cleaning on the trial level.